Fine-Tuning an LLM for Chrysler Crossfire Troubleshooting

A Capstone Project

Dre Dyson

2025-04-22

Introduction

  • Presenter: Dre Dyson, Graduate Student, University of West Florida.
  • Project: Capstone focused on fine-tuning a Large Language Model (LLM).
  • Goal: Create a specialized AI assistant for troubleshooting and repairing the Chrysler Crossfire.

Fine-Tuning a Large Language Model

Dre Dyson, University of West Florida

Brief Intro: “Hi everyone. My name is Dre Dyson… today, I want to share my Capstone project with you. It’s about fine-tuning a Large Language Model… that focuses on troubleshooting and repairing the Chrysler Crossfire.”

Why This Project?

Chrysler Crossfire

  • Personal Connection: I own and love my Chrysler Crossfire.
  • The Challenge: It’s nearly 20 years old – like many older cars, issues arise.
  • The Spark (Personal Story):
    • Sudden unintended acceleration on the highway.
    • Scary experience: foot off gas, brakes applied, car still accelerating.
    • Managed neutral (RPMs redlined), shut off, coasted to safety.
    • Problem vanished temporarily upon restart.
    • Frustration: Online searches, Facebook groups yielded no clear answers for days. Stayed off highways for safety.

The “Aha!” Moment

  • New Symptom (Weeks Later): Intermittent brake lights (sometimes worked, sometimes not).
  • Troubleshooting: Checked bulbs, fuses, wiring – all seemed okay.
  • The Culprit: Faulty brake light switch.
  • The Breakthrough: Replacing the switch fixed the brake lights AND the previous acceleration issue!
  • The Idea:
    • What if a dedicated AI, like ChatGPT but for my car, existed?
    • Could it help diagnose complex, linked issues faster and more accurately?
    • Could I build something like that? Yes.

Methods (4-Step Process)

4-step process graphic: Collect, Generate, Train, Test

  1. Collect: Gather real-world Chrysler Crossfire data (forums, guides).
  2. Generate: Transform collected data into high-quality Q&A training pairs.
  3. Train: Fine-tune a base LLM using the specialized Crossfire dataset.
  4. Test: Evaluate the fine-tuned model’s performance on Crossfire-specific questions.

Step 1 - Collecting the Data

  • Objective: Source real-world Crossfire problems and solutions.
  • Method: Automated scraping script targeting popular Crossfire forums.
  • Sources:
    • Troubleshooting threads
    • Repair guides (DIY)
    • Common issue discussions
  • Initial Data Volume: ~60,000 forum posts and 32 DIY PDF guides.
  • Refinement: Sampled down to ~25,000 relevant posts to keep the dataset manageable.
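
The filtering and sampling in this step can be sketched as below. The keyword list, the `is_relevant` heuristic, and the sample size logic are illustrative assumptions, not the project's actual scraping script.

```python
import random

# Keywords used to keep only troubleshooting/repair-related posts.
# This keyword set is an illustrative assumption, not the actual filter.
RELEVANT_KEYWORDS = {"crossfire", "brake", "throttle", "fuse", "sensor",
                     "transmission", "headlight", "battery", "wiring"}

def is_relevant(post: str) -> bool:
    """A post is kept if it mentions at least one repair-related keyword."""
    words = post.lower().split()
    return any(w.strip(".,!?") in RELEVANT_KEYWORDS for w in words)

def sample_posts(posts: list[str], target: int, seed: int = 0) -> list[str]:
    """Filter to relevant posts, then sample down to a manageable size."""
    relevant = [p for p in posts if is_relevant(p)]
    if len(relevant) <= target:
        return relevant
    rng = random.Random(seed)
    return rng.sample(relevant, target)

# Tiny example: 4 scraped posts, keep at most 2 relevant ones.
posts = [
    "My brake lights only work sometimes",
    "Selling floor mats, great condition",
    "Throttle reset procedure after battery swap?",
    "Blown fuse behind the dash, which slot?",
]
kept = sample_posts(posts, target=2)
print(len(kept))  # 2
```

The same idea, applied to the full scrape, is how ~60,000 posts were reduced to the ~25,000 most relevant ones.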

Step 2 - Generating Synthetic Data

  • Challenge: Raw forum data is unstructured and noisy.
  • Solution: Used Augmentoolkit (ATK).
    • Leverages Python and AI.
    • Reads raw text (posts, PDFs).
    • Transforms content into structured, high-quality Question/Answer pairs.
  • Outcome: Generated over 8,000 Crossfire-specific Q&A pairs suitable for model training.

Metric                     Fine-tuning Dataset
No. of dialogues           8,385
Total no. of turns         79,663
Avg. turns per dialogue    9.5
Avg. tokens per turn       41.62
Total unique tokens        70,165

Table 1: Summarizes the characteristics of the conversational dataset generated by ATK.
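
Statistics like those in Table 1 can be computed from the generated dialogues with a short script. The dialogue format (a list of turns) and whitespace tokenization are assumptions made for illustration; ATK's actual output format and tokenizer may differ.

```python
def dataset_metrics(dialogues: list[list[str]]) -> dict[str, float]:
    """Compute summary statistics like those in Table 1.

    Each dialogue is a list of turns; tokens are approximated by
    whitespace splitting (an assumption for this sketch).
    """
    turns = [t for d in dialogues for t in d]
    tokens = [tok for t in turns for tok in t.split()]
    return {
        "num_dialogues": len(dialogues),
        "total_turns": len(turns),
        "avg_turns_per_dialogue": len(turns) / len(dialogues),
        "avg_tokens_per_turn": len(tokens) / len(turns),
        "total_unique_tokens": len(set(tokens)),
    }

# Tiny illustrative dataset: two Q&A dialogues with made-up generic turns.
dialogues = [
    ["What size are the stock front wheels?",
     "Check the spec sheet for your model year."],
    ["My brake lights are intermittent.",
     "Test the brake light switch first.",
     "Replace it if the circuit reads open."],
]
m = dataset_metrics(dialogues)
print(m["num_dialogues"], m["total_turns"])  # 2 5
```

Running the same function over the full ATK output is what produces figures like the 8,385 dialogues and 9.5 average turns reported above.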

Step 3 - Fine-Tuning the Model

  • Objective: Teach a base LLM the collected Crossfire knowledge.
  • Tool: Unsloth (for faster, memory-efficient fine-tuning).
  • Base Model: Meta’s Llama 3.1 8B Instruct (powerful, open-source).
  • Dataset: Loaded the 8,000+ Crossfire Q&A pairs (detailed in Table 1).
  • Process: Supervised Fine-Tuning (SFT).
    • Epochs: 3
    • Steps: ~3,000
    • Learning Rate: 1E-4
  • Duration: Approx. 8 hours.
  • Result: A Llama 3.1 8B model fine-tuned specifically on Chrysler Crossfire knowledge.
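
As a sanity check, the reported step count is consistent with the dataset size. Assuming an effective batch size of 8 (an assumption; e.g. per-device batch 2 with gradient accumulation 4, a common Unsloth setup), 3 epochs over 8,385 dialogues works out to roughly 3,000 optimizer steps:

```python
import math

# Values from the slides; effective_batch_size is an assumed value
# chosen to show why 3 epochs over 8,385 dialogues gives ~3,000 steps.
num_dialogues = 8385
epochs = 3
effective_batch_size = 8  # e.g. per-device batch 2 x grad accumulation 4

steps_per_epoch = math.ceil(num_dialogues / effective_batch_size)
total_steps = steps_per_epoch * epochs
print(total_steps)  # 3147, i.e. ~3,000 steps as reported
```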

Results - Training Validation

1. Training Validation: Did the Model Learn?

  • The train/loss plot shows how well the model absorbed the Crossfire data during fine-tuning.
  • Observations:
    • Loss starts ~1.4-1.6, drops rapidly early on (fast initial learning).
    • Later, the drop slows, with some fluctuations (normal for batch training).
    • Towards the end (~3,000 steps), loss settles between 0.8-1.0.
  • Conclusion: The decreasing trend confirms the fine-tuning was effective; the model successfully learned from the specialized dataset.
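
Loss curves like this are usually read after smoothing; an exponential moving average (the same idea behind the smoothing slider in the Weights & Biases UI) makes the downward trend easier to see past the batch-to-batch fluctuations. The sample loss values below are illustrative, shaped like the run described above, not the actual logged values.

```python
def ema(values: list[float], alpha: float = 0.3) -> list[float]:
    """Exponential moving average, as commonly applied to loss curves."""
    smoothed = [values[0]]
    for v in values[1:]:
        smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed

# Illustrative losses: start near 1.5, drop quickly, settle around 0.8-1.0.
raw = [1.55, 1.30, 1.10, 1.05, 0.95, 1.00, 0.90, 0.92, 0.85, 0.88]
smooth = ema(raw)
print(round(smooth[-1], 2))  # 0.94 -- the noisy tail smooths to a flat plateau
```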

[Figure: Weights & Biases training loss plot, showing loss decreasing over training steps]

Results - Testing

2. Testing: How Did It Perform vs. Base Models?

  • Method: Compared Fine-Tuned 8B (“Chrysler Crossfire Model”) vs. standard Llama 3.1 (8B, 70B, 405B) Instruct on 5 specific Crossfire questions.
  • Example Questions Addressed:
    • “What type of battery should I use for my Chrysler Crossfire?” (Battery Type)
    • “What’s the stock front wheel size?” (Front Wheel Size)
    • “What headlight model does the Crossfire use?” (Headlight Model)
    • “What’s the stock rear wheel size?” (Rear Wheel Size)
    • “How do I perform a throttle reset?” (Throttle Reset Proc.)

Comparison Table:

Model                     Battery Type  Front Wheel Size  Headlight Model  Rear Wheel Size  Throttle Reset Proc.
Chrysler Crossfire Model  Correct       Correct           Correct          Correct          Correct
Llama 3.1 8B              Incorrect     Incorrect         Incorrect       Incorrect         Incorrect
Llama 3.1 70B             Incorrect     Incorrect         Incorrect       Incorrect         Incorrect
Llama 3.1 405B            Incorrect     Correct           Incorrect       Incorrect         Incorrect

Accuracy Table:

Model                     Correct Answers  Total Questions  Accuracy (%)
Chrysler Crossfire Model  5                5                100
Llama 3.1 8B              0                5                0
Llama 3.1 70B             0                5                0
Llama 3.1 405B            1                5                20

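
The accuracy figures follow directly from the per-question results; a small script makes the computation explicit. The `results` dict simply encodes the comparison of correct/incorrect answers reported above.

```python
# Per-question results (True = correct), copied from the comparison above.
results = {
    "Chrysler Crossfire Model": [True, True, True, True, True],
    "Llama 3.1 8B": [False, False, False, False, False],
    "Llama 3.1 70B": [False, False, False, False, False],
    "Llama 3.1 405B": [False, True, False, False, False],
}

def accuracy(answers: list[bool]) -> float:
    """Percentage of questions answered correctly."""
    return 100 * sum(answers) / len(answers)

for model, answers in results.items():
    print(f"{model}: {accuracy(answers):.0f}%")
# Chrysler Crossfire Model: 100% ... Llama 3.1 405B: 20%
```
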
  • Key Takeaway: Fine-tuning demonstrably imparted specific knowledge, significantly outperforming even much larger base models on these niche questions, achieving 100% accuracy compared to 0-20% for the others.

Conclusion

  • Achievement: Successfully fine-tuned a general LLM (Llama 3.1 8B) into a specialized Chrysler Crossfire expert.
  • Process Recap:
    1. Collected thousands of real-world forum posts & guides.
    2. Transformed unstructured text into a high-quality Q&A dataset using ATK (See Table 1).
    3. Fine-tuned the model using Unsloth and the specialized dataset.
    4. Tested against larger base models (see the comparison and accuracy tables above).
  • Key Finding: The targeted fine-tuned model significantly outperformed even vastly larger general models (like the 405B) on niche Crossfire questions.
  • Implication: Training smaller, specialized models with high-quality, domain-specific data is highly effective for niche applications.

References

  1. Armstrong, E., cocktailpeanut, darkacorn, Etherll, Teles, A. (afterSt0rm), abasgames, juanjopc, & RyanGreenup. (2024). e-p-armstrong/augmentoolkit: Augmentoolkit 2.0 (Version 2.0.0) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.13755901
  2. Armstrong, E. P. (2023). Augmentoolkit [Computer software]. GitHub. https://github.com/e-p-armstrong/augmentoolkit
  3. Han, D., Han, M., & the Unsloth team. (2023). Unsloth [Computer software]. GitHub. https://github.com/unslothai/unsloth
  4. Ling, C., Zhao, X., Lu, J., Deng, C., Zheng, C., Wang, J., … Zhao, L. (2024). Domain specialization as the key to make large language models disruptive: A comprehensive survey [Preprint]. arXiv. https://arxiv.org/abs/2305.18703
  5. Meta. (2024). Llama 3.1 8B [Computer software]. Hugging Face. https://huggingface.co/meta-llama/Llama-3.1-8B
  6. von Werra, L., Belkada, Y., Tunstall, L., Beeching, E., Thrush, T., Lambert, N., Huang, S., Rasul, K., & Gallouédec, Q. (2020). TRL: Transformer Reinforcement Learning [Computer software]. GitHub. https://github.com/huggingface/trl
  7. Weights & Biases. (2025). WandB Sweeps [Computer software]. https://docs.wandb.ai/guides/sweeps/